Standards for microarray data.
Abstract
One of the underlying principles of scientific publication in peer-reviewed journals has been the requirement that the authors make available the data and materials necessary for a reader to reproduce the experiment or analysis and to determine whether the data support the authors’ conclusions. In many instances, such as DNA sequence or protein structure data, this has evolved into the requirement that the data underlying each published report be deposited in an appropriate international database. For microarray experiments, simply defining the appropriate data has been a challenge, because the large quantity of data generated in each experiment and the typical complexity of the ancillary information needed to interpret the results are unlike anything that has yet faced the biological research community. Databases to hold microarray data and the tools to annotate them properly are under development. As an interim solution, we have described the types of data that are necessary to reproduce and interpret a microarray experiment. It should go without stating that this information is only of value as long as it is available, so every effort should be made to provide stable access to published data until such time as it is available from a public database.
The members of the Microarray Gene Expression Data (MGED) society (www.mged.org) have been working over the past few years to solicit community input in developing standards for the publication of DNA microarray data. The authors of this guide and the MGED society as a whole represent a large cross section of the scientific community that has worked with microarrays. We are convinced of the importance of the issues described and strongly urge journals to use these recommendations when deciding whether to publish a paper using microarray data.
In December 2001, we published a commentary in which we described MIAME, the Minimal Information About a Microarray Experiment (1).
MIAME is presented as a proposed standard for representation of array data that would be sufficient to allow readers of published reports to replicate the analysis presented and to facilitate the development of novel methods of data analysis by providing access to necessary primary data. Community response to MIAME was favorable, and many instrument manufacturers, software developers, and international databases moved to adapt their systems to capture and manage MIAME-compliant data. However, by far the most common request from the community has been for a brief set of guidelines that could be used by authors, editors, and referees to try to meet the MIAME data standards.
These requirements can easily be met by adequately describing the experiment, the materials and methods used, and either (i) a relatively simple supplementary Web site or (ii) submission of this information to one of the public repositories [ArrayExpress (www.ebi.ac.uk/arrayexpress) or GEO (www.ncbi.nlm.nih.gov/geo/)]. Reviewers and editors should strive to help authors meet these requirements and should ensure that, if a publication cannot meet them, there are sound reasons. This document in no way attempts to eliminate the need for editors or reviewers to use their judgment on both the appropriateness of the presentation and the validity of the report, but rather provides a guideline for them in their evaluation of whether or not a manuscript provides as much information as necessary for others to replicate and interpret the analysis presented. The proposed guidelines, including a checklist for ease of use, are available at www.mged.org/Workgroups/MIAME/miame_checklist.html.
ON BEHALF OF THE MGED: CATHERINE A. BALL,1 GAVIN SHERLOCK,1 HELEN PARKINSON,2 PHILIPPE ROCCA-SERA,2 CATHERINE BROOKSBANK,2 HELEN C. CAUSTON,3 DUCCIO CAVALIERI,4 TERRY GAASTERLAND,5 PASCAL HINGAMP,6 FRANK HOLSTEGE,7 MARTIN RINGWALD,8 PAUL SPELLMAN,9 CHRISTIAN J. STOECKERT JR.,10 JASON E. STEWART,11 RONALD TAYLOR,12 ALVIS BRAZMA,2* JOHN QUACKENBUSH13
1 Department of Genetics, Stanford University, Stanford, CA. 2 EMBL–European Bioinformatics Institute, Cambridge, UK. 3 Clinical Sciences Centre, Imperial College, London. 4 Bauer Center for Genomic Research, Harvard University, Cambridge, MA. 5 The Rockefeller University, New York, NY. 6 Universite D’Aix-Marseille II, Marseille, France. 7 University Medical Center, Utrecht, Netherlands. 8 The Jackson Laboratory, Bar Harbor, ME. 9 University of California at Berkeley. 10 University of Pennsylvania, Philadelphia, PA. 11 Open Informatics, Albuquerque, NM. 12 Center for Computational Pharmacology, University of Colorado School of Medicine, Denver, CO. 13 The Institute for Genomic Research, Rockville, MD.
*To whom correspondence should be addressed. EMBL Outstation–Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. E-mail: [email protected]
Journal: Science
Volume 298, Issue 5593
Publication date: 2002